Abstract. The study of meme propagation and the prediction of meme
trajectory are emerging areas of interest in complex networks research.
In addition to the properties of the meme itself, the structural
properties of the underlying network decide the speed and the
trajectory of the propagating meme. In this paper, we provide an
artificial framework for studying meme propagation patterns. Firstly,
the framework includes a synthetic network which simulates a real world
network and acts as a testbed for meme simulation. Secondly, we propose
a meme spreading model based on the diversity of edges in the network.
Through the experiments conducted, we show that the generated synthetic
network combined with the proposed spreading model is able to
simulate a real world meme spread. Our proposed model is validated by
the propagation of the Higgs boson meme on Twitter as well as on many
real world social networks.


1 Introduction 

“We ape, we mimic, we mock, we act” is a law universal to all human beings.
Imagine a lady in an elevator, heading to the fifth floor of her office. Suddenly,
one by one, every person in the elevator turns back. What does she do now?
According to the Elevator Groupthink psychology experiment [1], most of us would
turn back in such a situation. Usually, most of us become followers of the crowd
when faced with our sense of conformity. If ants follow each other with the help
of the pheromone trail, humans too involuntarily imitate and follow each other's
behaviours and ideas. Behaviours like obesity, smoking and altruism are also seen
to spread through social networks [2]. Today, Online Social Networks (OSNs) like
Facebook and Twitter provide a platform to fulfill people's penchant for
information sharing, arguing and mudslinging. Used by approximately 1.4 billion
people worldwide, Facebook's “Read, Like and Share” tradition has today become a
way of living. Understanding these spreading phenomena can help us in diverse
ways, such as accelerating the spread of useful information (e.g. health related
advice or disaster management announcements) and the viral marketing of
products and memes. Predicting the trajectory of a meme's propagation
in a network can also help prevent the spread of malicious rumors and
misinformation. Social networks play an instrumental role in the spread of
influence in today's world. Hence, contagion prediction models are an
extensively studied field in complex networks research. Such models evolve
frequently with time, aiming to depict real world information propagation more
accurately. Initially, meme propagation models were inspired by compartmental
epidemiological models [3]. These models [5] were too simplistic and did not
consider the role of edges in the spreading of information. Later on, the advent
of the independent cascade [6] and linear threshold [7] models proved seminal,
and these became the standard models for meme propagation. However, most of
these models did not take the network structure into consideration, and the
calculation of parameters for these models also remained a challenge.

Consider an anecdote about a small child, Bob, who went to visit the theme
park Six Flags Magic Mountain in California with his parents. Bob got lost in
the Fright Fest, which is the biggest and most terrifying maze at the theme park 
known for its complex spider-web like structure. Confused by the many turns the 
maze took at every step, poor Bob could not find his way out of the Fright Fest. 
When Bob did not return, his worried parents contacted the park authorities 
for help. These authorities having complete knowledge about the structure of 
the maze and the possible paths that could be traversed by the players, could 
easily locate Bob. Similarly, real world networks also have a complex yet distinct 
structure and if one could understand this structure and estimate the paths that 
can be taken by the meme in its trajectory, could she also not behave like the 
park authorities in the above analogy? In connection with the above anecdote, 
the knowledge of a network’s structure is important for understanding meme 
propagation. It is known that the real world social networks have a very well 
defined structure. We employ this well known structure for the simulation of a 
meme. 

The major contributions of the paper are:

1. Generation of an artificial synthetic network that mimics a real world social
network in terms of network structure.

2. We propose a spreading model for meme propagation based on the structure
of the network. This model is based on the difference in spreading
probabilities of different edges, which is recognised from the network itself.

The proposed synthetic network and spreading model give a synthetic simulation
environment which serves as a test-bed to study meme propagation patterns.
Further, the model gives a way of organising the edges in a hierarchy based on
the varying probabilities of information transmission across these edges. We
validate the proposed spreading model against the real world spreading of the
Higgs boson meme on Twitter. If one could extract the structure of offline
social networks, our framework could be used for understanding a wide range of
phenomena on offline networks in addition to online networks. Beyond controlling
information flow on OSNs, the framework could help curb the behavioural spread
of obesity and depression and promote altruism and positive movements.
Inspired by the diversity of edges in a social network, the paper sheds
light on a novel aspect of information propagation.


The rest of the paper is organised as follows: Section 2 describes the related 
work. Section 3 explains the synthetic networks in addition to describing the 
real world networks used for simulation. The network structure based spreading 
model is proposed in Section 4. Section 5 is devoted to results and discussion. 
Section 6 illustrates the extension of our model as well as future possibilities. 
Finally, the paper is concluded in Section 7. 


2 Related Work 


An enormous amount of work has been done to study information propagation
patterns on online social networks. Initially, memes in a social
network were considered analogous to a virus in a biological network [4]. As a
result, most information spreading models were inspired by compartmental
epidemiological models like the SIS and SIR models [5] introduced in 1989.
However, these models assumed a homogeneous mixing of the people constituting
the population and did not take into account interactions between individuals.
Later, the independent cascade (IC) [6] and linear threshold (LT) [7] models were
investigated, and these are now used as the standard models for information
propagation. However, these models did not consider factors like network
structure and the choice of model simulation parameters. Some studies predicted
the parameters associated with information propagation models, but these are
largely based on past data, which is difficult to obtain. Studies that consider
the impact of network structure on a meme's propagation provide a relative view
of the meme spread. For example, epidemics spread faster on scale free networks
than on random networks due to the presence of hubs [T^]. Zhang et al. present
a stochastic model for the information propagation phenomenon. Studying
information propagation may help scientists in a number of ways, such as halting
the spread of misinformation [14] and accelerating useful information through a
network.

Meme virality prediction is an active research area in social network
analysis, and meme propagation models can be used extensively in fields like
viral marketing. Viral marketing can be done by targeting a set of nodes in
a network, as done by Kleinberg et al. in their paper on influence maximisation.
Influential spreaders play a significant role in information propagation,
as shown by Kitsak et al. in their work [5D]. Meme virality depends not only
on the network structure and the nodes in a network; intuitively, meme
content also has a role to play in a meme becoming viral. Though most
studies consider nodes in their study of meme virality, we consider the property
of edges in the spread of information. An edge connecting a vulnerable node to
an influential node may have more impact than an edge connecting one vulnerable
node to another. Our study takes the diversity of edges into consideration and
then probes into the meme pattern that can be formed.








3 Generation of Networks for Meme Simulation: SCCP Networks

It has been observed that most social networks are scale free and can be
generated by the preferential attachment model. Further, these networks have
communities because of the phenomenon of homophily, which leads to the
formation of dense clusters in the network. We also consider one more meso scale
characteristic in the formation of the network: core-periphery structure. It has
been shown that scale free networks possess an implicit core-periphery structure.
Considering these three characteristics, we simulate real world networks
via SCCP networks, which exhibit Scale-free structure, presence of
Communities and Core-Periphery structure. We introduce a modification to the
algorithm employed by Wu et al. to generate these synthetic networks.

The algorithm is described below: 

Input:

1. $k$ = Number of communities in the network

2. $s$ = Initial number of nodes in each community

3. $t_i$ = Number of new nodes to be added in community $i$, where $0 \le i \le k-1$

4. $t = \max_i(t_i)$

5. $f$ = Fraction of edges each incoming node makes in its own community, $0.5 < f < 1$

6. $[r_1, r_2]$ = Range for the number of edges an incoming node can make. $r_1$ and
$r_2$ decide the density of the graph

Output:

1. $G$ = The SCCP network formed from the input parameters specified above


Algorithm 1 Algorithm for generating a SCCP network

1: We start the formation of the network $G$ with $k$ cliques, each of size $s$.

2: We perform $t$ iterations. In each iteration, we add one node to every community
except for the communities in which $t_i$ new nodes have already been
added. The following two steps are performed in order to add a new node.

- Each newly arriving node chooses a random number $m$ in the range $[r_1, r_2]$.
Then, it makes $m$ edges with the already existing nodes.

- The newly arrived node makes $fm$ edges with the nodes in its own community
and $(1-f)m$ edges with the nodes in other communities. All the edges are
created in a preferential attachment manner.

3: We detect core nodes in the generated graph by using the k-shell decomposition
algorithm [30]. We declare the core nodes to form a separate community of their own.
So the total number of communities in the network now becomes $k+1$.
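
Steps 1 and 2 of the algorithm can be sketched in plain Python as follows. This is only an illustrative sketch, not the authors' implementation: the names `generate_sccp` and `preferential_pick` are hypothetical, the rounding of $fm$ to a whole number of edges is an assumption (the text does not state a rounding rule), and the sketch assumes $s \ge 2$ so that every existing node has a positive degree. Step 3 (core detection) is left out.

```python
import random


def preferential_pick(candidates, adj, count, rng):
    """Pick up to `count` distinct nodes, each draw weighted by current degree."""
    pool = list(candidates)
    chosen = []
    while pool and len(chosen) < count:
        weights = [len(adj[v]) for v in pool]  # assumes every degree is > 0
        pick = rng.choices(pool, weights=weights, k=1)[0]
        chosen.append(pick)
        pool.remove(pick)
    return chosen


def generate_sccp(k, s, t_list, f, r1, r2, seed=None):
    """Return (adjacency dict, node -> community dict) after steps 1 and 2."""
    rng = random.Random(seed)
    adj, community = {}, {}
    members = [[] for _ in range(k)]
    next_id = 0

    # Step 1: start with k cliques, each of size s.
    for c in range(k):
        for _ in range(s):
            node, next_id = next_id, next_id + 1
            adj[node] = set()
            for other in members[c]:
                adj[node].add(other)
                adj[other].add(node)
            community[node] = c
            members[c].append(node)

    # Step 2: t = max(t_i) iterations, one new node per unfinished community.
    for iteration in range(max(t_list)):
        for c in range(k):
            if iteration >= t_list[c]:
                continue  # community c already received its t_i new nodes
            m = rng.randint(r1, r2)   # number of edges for the new node
            intra = round(f * m)      # assumed rounding rule for f*m
            inter = m - intra
            outside = [v for j in range(k) if j != c for v in members[j]]
            targets = (preferential_pick(members[c], adj, intra, rng)
                       + preferential_pick(outside, adj, inter, rng))
            node, next_id = next_id, next_id + 1
            adj[node] = set()
            community[node] = c
            members[c].append(node)
            for v in targets:
                adj[node].add(v)
                adj[v].add(node)
    return adj, community
```

For the parameters of the illustration in figure 1, a call would look like `generate_sccp(k=3, s=4, t_list=[1, 1, 1], f=0.7, r1=2, r2=6)`. Drawing targets one at a time without replacement is one simple way to approximate preferential attachment while avoiding duplicate edges.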


In our algorithm, we relax the condition that every node makes an equal number
of edges on its arrival. This is because, when a new person joins a social
network, he or she is not required to make a predefined number of friends; the
number of friendships varies from person to person. Moreover, the above algorithm
allows the formation of communities of different sizes.



Fig. 1: Generation of a SCCP network 


One iteration of this algorithm is illustrated in figure 1, where $k = 3$, $s = 4$,
$t_i = 1$ $\forall i$, $r_1 = 2$, $r_2 = 6$, and $f = 0.7$. The network starts with 3 communities,
each of which is a clique of 4 nodes. Communities 1, 2 and 3 are represented by
red, blue and green colours respectively. Next, we add 3 nodes, one node to each
community.

- The first node is added in community 1. It chooses the random number 4. Then it
makes 3 intra community edges and 1 inter community edge.

- The second node is added in community 2. It chooses the random number 5. Then
it makes 3 intra community edges and 2 inter community edges.

- The last node is added in community 3. It chooses the random number 3. Then
it makes 2 intra community edges and 1 inter community edge.

- Next, we detect the core-periphery structure in the generated network by using
k-shell decomposition.

4 Proposed Spreading Model 

Meme propagation on a real world network follows the pattern of a complex
contagion. Unlike a simple contagion, the spreading pattern of a complex contagion
depends on factors like homophily and social reinforcement^. A simple contagion
is like an infectious disease which spreads with equal probability across all the
edges, while a complex contagion spreads with different probabilities depending
on factors like social reinforcement and homophily. In addition, user
influence also plays a prominent role in meme propagation. We take all these
factors into consideration in modeling the diffusion of a meme permeating through
the ties in the network.

^ Homophily is the name given to the tendency of similar people to become friends
with each other. This leads to a larger number of ties between like minded people
and hence to the formation of communities in the network. Social reinforcement is
the phenomenon by which multiple exposures of a person to a piece of information
lead to that person adopting it. Social reinforcement and homophily tend to
confine the information inside one community.

Our model is based on two key ideas:

1. Diversity in Tie Strength: “Birds of a feather flock together”. We
are more engaged and connected with the people in our own community
than with people from other communities [23]. Hence, the probability
associated with the edges connecting people of the same community should
be higher than that of the edges connecting people of different communities.
This observation is motivated by the theory of weak ties [24].

2. The social status of nodes: The social influence of a person in a network
plays a big role in the acceptance of information propagated by that person. A
person's social status also decides whether that person is vulnerable to adopting
information. Simply stated, the lower the status, the higher the vulnerability,
and the higher the status, the greater the influence.

Because of the presence of core-periphery structure in SCCP networks, there
are two kinds of nodes in a SCCP network: core nodes and periphery nodes
(periphery nodes are further divided into many communities). Initially, all the
nodes are uninfected, and a node turns infected as soon as it adopts a meme. We
call an infected node u the sender and an uninfected neighbour v of u the
receiver of an infection. The probability of infection transmission across an edge
depends on the types of both nodes, the sender and the receiver. In our model,
the probabilities of infection across edges are divided into five categories:

$P_{cc}, \; P_{cp}, \; P_{pc}, \; P_{pp_0}, \; P_{pp_1}$

Here, ‘P’ represents probability. The type of edge is represented by the
subscript. The subscript's first letter denotes the type of the sender node and
the second letter denotes the type of the receiver node; ‘c’ represents core and
‘p’ represents periphery. 0 in the subscript denotes that the sender and the
receiver belong to the same community, while 1 denotes that they belong
to different communities. We worked towards predicting the most plausible
order for these edge probabilities, which is initially proposed to be:

$P_{cc} > P_{cp} > P_{pp_0} > P_{pp_1} > P_{pc}$

Our model can be considered an extension of the simple cascade model, with a
slight change in the definition: in every iteration, each infected node tries to
infect its uninfected neighbours in accordance with the above probability
hierarchy.
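
The spreading rule described in this section can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' code: the function names `edge_type` and `simulate_cascade`, the dictionary keys `"cc"`, `"cp"`, `"pc"`, `"pp0"`, `"pp1"`, and the data layout (adjacency dict, core flags, community labels) are all hypothetical, and the sketch assumes that every infected node retries all of its uninfected neighbours in every iteration, which is one reading of the "slight change in the definition".

```python
import random


def edge_type(sender, receiver, is_core, community):
    """Classify a sender -> receiver edge into one of the five categories."""
    if is_core[sender]:
        return "cc" if is_core[receiver] else "cp"
    if is_core[receiver]:
        return "pc"
    # Both endpoints are periphery nodes: 0 = same community, 1 = different.
    return "pp0" if community[sender] == community[receiver] else "pp1"


def simulate_cascade(adj, is_core, community, prob, seeds, max_iters=100, seed=None):
    """Return the cumulative number of infected nodes after each iteration."""
    rng = random.Random(seed)
    infected = set(seeds)
    history = [len(infected)]
    for _ in range(max_iters):
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                if v in infected or v in newly_infected:
                    continue
                # Infection succeeds with the probability of this edge type.
                if rng.random() < prob[edge_type(u, v, is_core, community)]:
                    newly_infected.add(v)
        infected |= newly_infected
        history.append(len(infected))
    return history
```

The caller supplies `prob`, a dictionary mapping each of the five edge types to its transmission probability, so the same function can also express a uniform model by giving all five types the same value.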



5 Datasets 


We have used multiple SCCP networks, random graphs [53] and real world
networks [31] in our study. We have considered the two most widely used online
social networks, Facebook and Twitter, which have approximately 1371 and 271
million users respectively. To evaluate our complete framework, we use the Higgs
boson meme propagation information on Twitter (dataset 1). Dataset 1 gives a
complete picture of a meme spreading on an online social network, along with the
information on “who infected whom” at every step. The datasets' specifications
are given below:

1. Dataset 1: Dataset 1(a): This dataset is an induced directed unweighted
subgraph on Twitter users who were involved in any of the activities (reply,
retweet, or mention) regarding the Higgs boson meme^ [26]. It is an
undirected unweighted graph containing 456631 nodes and 14855875 edges.
Dataset 1(b): This is a directed weighted graph between the Twitter users
who were involved in retweeting the Higgs boson meme. There is an
edge from B to A if A retweets B. This graph contains 425008 nodes and
733647 edges. In datasets 1(a) and 1(b), the tweets posted on Twitter about
this discovery between 1st and 7th July 2012 are considered.

2. Dataset 2: This dataset is an undirected unweighted induced subgraph on 
Facebook with 4039 nodes and 88234 edges [55| . 

3. Dataset 3: This dataset is an induced undirected unweighted subgraph on 
Twitter with 81306 nodes and 1768149 edges [53] . 

4. Dataset 4: These datasets have been derived from the algorithm proposed 
in the previous section. 

Dataset 4(a): This is a SCCP network on 65800 nodes, 591750 edges and 
11 communities. 

Dataset 4(b): This is a SCCP network on 4000 nodes, 170314 edges and 
11 communities. 

5. Dataset 5: This is an Erdos-Renyi graph on 4000 nodes and 34650 edges. 

We detect communities in datasets 1(a), 2 and 3 using the fast greedy
modularity optimization algorithm. This algorithm, given by Newman et al. [25],
is used to detect community structure in very large graphs. We also find
the core-periphery structure of all the above listed datasets using the k-shell
decomposition algorithm. We assign each node a coreness value equal to the
shell value assigned to it by the algorithm. Then, we pick the top 10% of the
nodes with the highest coreness values and call them the core nodes. The
remaining nodes are termed periphery nodes.
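
The core selection described above can be illustrated with a short pure-Python sketch. It is not the authors' code: `k_shell` and `split_core_periphery` are hypothetical names, ties between nodes with equal coreness at the 10% boundary are broken arbitrarily, and keeping at least one core node on very small graphs is an added assumption.

```python
def k_shell(adj):
    """Return {node: shell index} by repeatedly peeling low-degree nodes."""
    degree = {v: len(neighbours) for v, neighbours in adj.items()}
    remaining = set(adj)
    shell = {}
    while remaining:
        k = min(degree[v] for v in remaining)  # current shell index
        peel = [v for v in remaining if degree[v] <= k]
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue  # already removed via another path
            remaining.remove(v)
            shell[v] = k
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        peel.append(u)  # u now falls into shell k as well
    return shell


def split_core_periphery(adj, core_fraction=0.10):
    """Label the top `core_fraction` of nodes by coreness as core nodes."""
    shell = k_shell(adj)
    ranked = sorted(adj, key=lambda v: shell[v], reverse=True)
    n_core = max(1, int(core_fraction * len(ranked)))  # at least one core node
    core = set(ranked[:n_core])
    return core, set(adj) - core
```

On large graphs a library routine for core numbers would normally be preferable; the explicit loop is used here only to keep the example self-contained.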


^ The Higgs boson is one of the most elusive elementary particles in modern physics.
A meme on Twitter is considered to be a Higgs boson meme if it contains at least one
of these keywords or tags: lhc, cern, boson, higgs



6 Experiments and Results 

6.1 Spreading Model Validation 

Our model was validated using datasets 1(a) and 1(b), where 1(a) gives us
information about the structure of a social network and 1(b) is the cascading
pattern of a meme over 1(a). Let dataset 1(a) be represented by $G(V, E)$.
Based on the structure of $G$, we partition its nodes into two subsets $C$ and $P$.
$C$ is the set of core nodes and $P$ is the set of periphery nodes, such that
$C \cup P = V$ and $C \cap P = \emptyset$. We also associate a variable $S_{ij}$ with each
edge $E_{ij}$: $S_{ij} = 1$ if nodes $i$ and $j$ belong to the same community, else $0$.
We divide the edges in the retweet network (dataset 1(b)) into four categories
based on the types of users an edge is connecting. These categories are as follows:

1. $E_{cc} = \{E_{ij} \in E : (i \in C) \wedge (j \in C)\}$

2. $E_{cp} = \{E_{ij} \in E : (i \in C) \wedge (j \in P)\}$

3. $E_{pc} = \{E_{ij} \in E : (i \in P) \wedge (j \in C)\}$

4. $E_{pp} = \{E_{ij} \in E : (i \in P) \wedge (j \in P)\}$

- $E_{pp_0} = \{E_{ij} \in E : (i \in P) \wedge (j \in P) \wedge (S_{ij} = 1)\}$

- $E_{pp_1} = \{E_{ij} \in E : (i \in P) \wedge (j \in P) \wedge (S_{ij} = 0)\}$

The types of nodes for 1(b) are extracted from its main graph 1(a). 

In retweet networks, the weight of an edge from A to B specifies the amount
of information flowing from A to B (the number of times B retweeted a message from
A). Therefore, the greater the weight, the higher the probability of information
transmission across that edge. We calculate the following weights from the above
graphs. Let $W(E_{ij})$ be the weight of an edge from node $i$ to node $j$, and let
$N_{xy}$ represent the number of edges of type $E_{xy}$, where $x$ and $y$ are the
types of nodes and hence take the values $p$ and $c$. Then, we calculate $W_{xy}$,
the sum of the weights of all the edges from a node of type $x$ to a node of type
$y$, divided by $N_{xy}$.

1. $W_{cc} = \sum W(E_{ij}) / N_{cc}$ such that $E_{ij} \in E_{cc}$

2. $W_{cp} = \sum W(E_{ij}) / N_{cp}$ such that $E_{ij} \in E_{cp}$

3. $W_{pc} = \sum W(E_{ij}) / N_{pc}$ such that $E_{ij} \in E_{pc}$

4. $W_{pp} = \sum W(E_{ij}) / N_{pp}$ such that $E_{ij} \in E_{pp}$

- $W_{pp_0} = \sum W(E_{ij}) / N_{pp_0}$ such that $E_{ij} \in E_{pp_0}$

- $W_{pp_1} = \sum W(E_{ij}) / N_{pp_1}$ such that $E_{ij} \in E_{pp_1}$

The weights obtained show that the observed order is the same as the one we
proposed earlier, thereby validating that ordering, i.e.
$W_{cc} > W_{cp} > W_{pp_0} > W_{pp_1} > W_{pc}$.
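
The per-category averages used in this validation can be computed along the following lines. This is a hedged sketch on a toy input, not the authors' pipeline: the function name `average_weight_by_edge_type`, the `(sender, receiver, weight)` edge format and the category keys are assumptions made for illustration.

```python
from collections import defaultdict


def average_weight_by_edge_type(weighted_edges, core, community):
    """Average retweet weight per edge category (cc, cp, pc, pp, pp0, pp1)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for sender, receiver, weight in weighted_edges:
        if sender in core:
            kinds = ["cc" if receiver in core else "cp"]
        elif receiver in core:
            kinds = ["pc"]
        else:
            same = community[sender] == community[receiver]
            # A periphery-periphery edge counts towards pp and its sub-category.
            kinds = ["pp", "pp0" if same else "pp1"]
        for kind in kinds:
            totals[kind] += weight
            counts[kind] += 1
    return {kind: totals[kind] / counts[kind] for kind in totals}


# Toy example: nodes 0 and 1 are core; 2 and 3 share a community, 4 does not.
edges = [(0, 1, 4), (0, 2, 2), (2, 0, 1), (2, 3, 3), (2, 4, 1), (3, 2, 5)]
averages = average_weight_by_edge_type(edges, core={0, 1},
                                       community={2: 0, 3: 0, 4: 1})
```

Sorting the resulting dictionary by value gives the empirical ordering, which can then be compared with the proposed one.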

6.2 Simulation Results 

We examine the extent as well as the rate of infection of the network while
propagating a meme on it. We simulate the EBH as well as the uniform spreading
model on a number of datasets and report the results. For the simulation of our
proposed model, we use the following probabilities: $E_{pc}$: 0.00001, $E_{pp_0}$: 0.0003,
$E_{pp_1}$: 0.0001, $E_{cc}$: 0.006, and $E_{cp}$: 0.004. For the simulation of the uniform
spreading model, every edge is considered to have an equal probability of infection,
i.e. $E_{ij} = 0.0002$, where $i$ and $j$ are the endpoints of an edge. We have chosen
these probabilities such that we can visualise the spreading pattern of a meme
to the best possible extent. For all the figures in this section, the X axis
represents the number of iterations and the Y axis represents the cumulative
number of nodes infected up to that iteration. The results are structured in
three parts:



Fig. 2: Spreading patterns on different kinds of networks and their comparison to real world data.
1: Spreading patterns on datasets 3 and 4(a). 2: Spreading pattern on dataset 1(a). 3: Actual spreading
pattern of the Higgs boson meme (datasets 1(a) and 1(b)). 4: Comparison between the proposed spreading
model on datasets 2, 4(a), and 5. 5: Spreading patterns for dataset 4(a). 6: Proposed and uniform
spreading models on dataset 3.


Meme spreading patterns on different networks using the EBH and
uniform spreading models

Figure 1.2 shows the actual spreading pattern of the Higgs boson meme, which
indicates that in the real world, a meme does not have a constant growth rate.
The rate remains constant up to some point, after which the popularity of the
meme shoots up steeply and then slowly fades, giving rise to a sigmoid curve
which is characterised by the equation $E(x) = 1/(1 + e^{-x})$. Figure 1.1 shows
the simulation of our proposed spreading model on the SCCP network and two real
world networks of Facebook and Twitter. It can be seen that in both these cases,
the curve for the spreading pattern is sigmoidal, just like figure 1.2. Figure
1.4 shows the difference in the spreading patterns when the simulation is done
through a uniform spreading model and our model respectively. It can be seen
that the simulation through a uniform spreading model is also a sigmoid function
but has a lesser value of parameter x. Figure 1.3 shows the simulation of the
proposed spreading model on 3 different kinds of networks. Despite simulating
the EBH spreading model on all the three graphs, the value of x is observed to
be lower only in the case of the random network^. Thus we can say that the sharp
S shaped infection pattern is observed only for SCCP networks. These graphs show
that both an SCCP network and the EBH spreading model are required to mimic real
world meme propagation. Some similar results are described in the attached
appendix B.

Fig. 3: Spreading patterns starting from different types of seed nodes and their comparison to real
world data. 1: Proposed spreading model on dataset 4(a) where spreading starts from periphery
nodes. 2: Proposed spreading model on dataset 4(a) where spreading starts from core nodes. 3: Actual
spreading pattern for the Higgs boson meme (from datasets 1(a) and 1(b)). 4: Spreading patterns
starting from a single community. 5: Spreading patterns starting from multiple communities. 6: Actual
spreading pattern for the Higgs boson meme (from datasets 1(a) and 1(b)).


Explanation of the plateau structure observed in the meme pattern 

Figure 2.3 shows the pattern of infection of core nodes and periphery nodes
for the actual Higgs boson meme. As in the previous case, all iterations are
considered to be of equal length (10 timestamps). We observe the cumulative
number of core nodes and periphery nodes infected in every iteration. When
we start the infection from periphery nodes (figure 2.1), the plateau in
the curve continues until a core node is infected, and then the infection shoots
up suddenly. Figure 2.2 shows the plot when the infection is started only from
the core nodes. We can see that in this case, the infection shoots up immediately,
without the plateau. This supports the observation that the number of
periphery nodes infected increases sharply as soon as a sufficient fraction of the
core nodes gets infected.


Effect of communities and core nodes on meme virality

In figure 2.4, we start the infection from a single community and show that the
infection spreads to multiple communities only when the meme infects the core
sufficiently and goes viral. Figure 2.5 shows the spreading pattern when the
infection starts from multiple communities; here too, the meme becomes viral only
after the infection of core nodes. So, whether the infection starts from a single
community or from multiple communities, the infection of core nodes is sufficient
to predict its virality. Figure 2.6 shows the actual spreading pattern of the
Higgs boson meme.

7 Conclusion and Future Work 

Many researchers are working on models that can predict the pattern of meme
spread in a real world network. A number of models have been proposed for this,
ranging from simple epidemiological models to standard models like Linear
Threshold and Independent Cascade. Most of these models do not give an approach
to identify the parameters required to simulate them. Moreover, they are
proposed for all kinds of networks, though they can be improved upon and
specialised for a particular kind of network. Hence, improving these models to
better simulate meme propagation is possible. It is shown that, together, the
SCCP and EBH models effectively simulate real world meme propagation. The
sigmoid curve with a sharp slope is shown to be the characteristic pattern of
an internet meme. Furthermore, the importance of core nodes in marking the
virality of a meme is emphasised. It is also shown that infecting multiple
communities requires the infection of core nodes. The study is validated with
the Higgs boson meme spreading on Twitter in addition to various other real
world networks. This study opens a new direction of considering edge diversity
in meme propagation models.

One can extend our problem to predict the exact values of the probabilities
influencing the meme propagation. This can greatly help in the prediction of a
future cascade pattern. If such cascades could be predetermined, then we could
exert control on our otherwise ever changing social networks. Not only could
preventive checkpoints be placed in the network, but useful information could
also be accelerated through the network by using the predicted meme trajectory.

^ In the case of a random network, even though the declared 10% core nodes have a
high probability of infecting their neighbours, the connections between the core
nodes are not dense enough to result in an overshoot in the number of infected
nodes. So, the absence of a distinct core-periphery structure in such networks
makes them unsuitable for our framework.